Skip to content

FEAT: Tool Use + MCP#1811

Open
ValbuenaVC wants to merge 21 commits into
microsoft:mainfrom
ValbuenaVC:MCP
Open

FEAT: Tool Use + MCP#1811
ValbuenaVC wants to merge 21 commits into
microsoft:mainfrom
ValbuenaVC:MCP

Conversation

@ValbuenaVC
Copy link
Copy Markdown
Contributor

@ValbuenaVC ValbuenaVC commented May 26, 2026

Description

PyRIT's existing tool-calling story is fragmented:

  • OpenAIChatTarget parses tool_calls into function_call pieces and stops — no execution, no loop.
  • OpenAIResponseTarget hand-rolls a complete agentic loop inside _send_prompt_to_target_async, accepts a custom_functions registry of Python callables, and dispatches one tool call per turn.
  • MCP servers are not a recognized concept.

This PR introduces a single, target-agnostic tool-use primitive that any PromptTarget subclass can opt into:

  • New pyrit/tools/ package with a tool_loop decorator wired onto PromptTarget.send_prompt_async, a ToolCallParser protocol (per-target detection), and a ToolBackend ABC with two concrete backends — LocalToolBackend (in-process Python callables) and MCPToolBackend (stdio MCP servers via the official mcp SDK).
  • New TargetCapabilities.supports_tool_use capability flag plus ToolEventPolicy (EXECUTE / RAISE / RETURN_RAW) and a tool_backend slot on TargetConfiguration. The policy lets red-team callers observe attempted tool use without executing it, or hand the raw response back untouched.
  • OpenAIResponseTarget migrated onto the decorator. The in-class agentic loop is gone; _send_prompt_to_target_async returns exactly one Message per call and the decorator stitches multi-turn iterations into the response list. Multiple tool calls in a single turn are now dispatched all-at-once sequentially — the protocol-intended behavior.
  • An InlineToolCallParser that walks text pieces for marker-delimited JSON blocks (configurable regex; defaults to angle-bracket syntax). Non-OpenAI deployments that emit tool calls inline in generated text can opt in by supplying this as _tool_parser.
  • AzureMLChatTarget and HuggingFaceChatTarget gain optional tool_parser and tool_backend constructor kwargs that opt them into the decorator without subclassing. Supplying a parser flips supports_tool_use=True on the default capabilities so callers don't need a custom_configuration just to enable tool use. The two targets use different wire-format wrappings (AzureMLChatTarget wraps schemas in the OpenAI Chat Completions {"type":"function","function":{...}} envelope; HuggingFaceChatTarget passes bare schemas straight into tokenizer.apply_chat_template).
  • ChatMessageNormalizer now serializes function_call and function_call_output pieces into the OpenAI Chat Completions wire shape (assistant message with tool_calls; role="tool" message with tool_call_id). This is what makes the chat-completions-shaped targets above able to round-trip tool conversations through @tool_loop without target-side translation code.
  • custom_functions kwarg on OpenAIResponseTarget is deprecated (removed_in="0.16.0"); internally rewrapped as a LocalToolBackend so the legacy path keeps working through one release cycle.

OpenAIChatTarget is intentionally left as-is. The Responses API is the modern agentic surface for OpenAI; new tool-calling investment there would age poorly. Targets that need tool calling for non-Responses-API endpoints opt into the decorator by supplying a parser and a backend.

Future MCP transports (HTTP/SSE, Docker sandbox), additional sandbox providers, and streaming all plug in behind the existing ToolBackend / MCPServerSpec interfaces with no abstraction changes. The MCPServerSpec union ships with three variants: LocalMCPServerSpec (the only one with a working transport) plus stub declarations of RemoteMCPServerSpec and DockerMCPServerSpec whose connect_async raises NotImplementedError. Future PRs implement an already-declared variant rather than expanding the union.

Tracks deferred work via TODOs marked # TODO(streaming-v2), # TODO(mcp-http-transport), # TODO(mcp-resources), and # TODO(sandbox-provider).

Compatibility

This PR is not breaking for the standard tool-calling path. Compatibility caveats reviewers should know about:

  1. Source-compat — PromptTarget.send_prompt_async is @final. External subclasses that override the public entrypoint (not just _send_prompt_to_target_async) will fail to import. No in-tree target overrides it today.
  2. Deprecation — OpenAIResponseTarget(custom_functions=...). The kwarg now emits DeprecationWarning(removed_in="0.16.0") and is internally rewrapped as a LocalToolBackend. No runtime behavior change in the current release cycle.
  3. Intentional behavior change — multi-call-per-turn dispatch on the Response target. When a model response contains N>1 tool calls, the new loop dispatches all N sequentially in declaration order. The previous hand-rolled loop only dispatched the last call per turn. This is strictly more dispatching, not less, so it cannot regress any working code; it matches the OpenAI protocol's actual intent.
  4. Private API removal on OpenAIResponseTarget. _find_last_pending_tool_call, _execute_call_section, and _make_tool_piece are no longer called from production code. Listed for changelog completeness — these were always private.

Tests and Documentation

  • New tests/unit/tools/ directory covering the decorator, parsing, LocalToolBackend, MCPClient (real stdio subprocess against a deterministic FastMCP fixture), and MCPToolBackend (multi-server routing, name-collision detection, name_prefix disambiguation, allowed_tools filtering, and concurrent-dispatch serialization).
  • New tests/unit/prompt_target/common/test_prompt_target_tool_loop.py asserting decorator order-of-execution against a fake target and using patch_central_database to verify per-message insert ordering, per-role labeling (assistant, tool), and per-data-type labeling (function_call, function_call_output) against the actual DB schema.
  • New tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py covering the migration onto @tool_loop, the deprecation warning on custom_functions, schema injection into request bodies, extra_body_parameters["tools"] precedence, and multi-call-per-turn sequential dispatch.
  • Existing test_openai_response_target_function_chaining.py sentinel tests pass unchanged: the back-compat property on _custom_functions keeps in-place mutations working.
  • New tests/unit/tools/test_inline_parser.py covering InlineToolCallParser across marker syntaxes (angle-bracket, pipe-delimited tag pair, square-bracket list payload), mode coverage (TRUNCATE_AT_LAST / TRUNCATE_AT_FIRST / EXTRACT_ALL / STRICT_TRAILING_EMPTY), and edge cases (empty input, malformed JSON, missing name field, multi-piece messages).
  • Extended tests/unit/prompt_target/target/test_azure_ml_chat_target.py and tests/unit/prompt_target/target/test_huggingface_chat_target.py with coverage for the new tool_parser / tool_backend kwargs: capability flipping, backend installation, request-body shape, no-tools backward compatibility, and (for the AzureML side) response materialization into function_call pieces.
  • Extended tests/unit/message_normalizer/test_chat_message_normalizer.py with full round-trip coverage of tool-piece serialization (function_call → assistant tool_calls, function_call_outputrole=tool with tool_call_id).
  • New tests/integration/tools/test_red_teaming_with_tools.py running the real RedTeamingAttack against OpenAIResponseTarget with only the HTTP layer mocked. Tools are served by the real echo_mcp_server subprocess; the MCP stdio subprocess, AsyncExitStack lifecycle, canonical envelope round-trip, and RedTeamingAttack execution path all run unmocked.
  • New tests/integration/tools/test_azure_ml_with_tools_integration.py exercising the full PyRIT @tool_loop stack against AzureMLChatTarget with only the HTTP layer mocked. Asserts the canonical four-piece transcript (user → assistant function_call → tool function_call_output → assistant text) lands in Memory with matching call_id round-tripping.
  • No notebook/doc additions in this PR — follow-up scenarios PR will exercise the public API.

JupyText: not applicable (no notebook changes).

Victor Valbuena and others added 4 commits May 26, 2026 13:59
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ValbuenaVC ValbuenaVC changed the title [DRAFT] FEAT: Tool Use via MCP [DRAFT] FEAT: Tool Use + MCP May 26, 2026
Victor Valbuena and others added 3 commits May 26, 2026 15:56
… into PromptTarget.send_prompt_async

C4 lands the in-tree wiring for the generic tool-use loop introduced by C2/C3:

- TargetCapabilities gains supports_tool_use: bool (default False) and
  CapabilityName.TOOL_USE for the corresponding enum value, matching the
  existing supports_X / "supports_X" naming convention used by every
  other capability.
- TargetConfiguration grows tool_event_policy + tool_backend kwargs,
  both gettable/settable properties. The setter (and constructor)
  validate that a non-None tool_backend requires supports_tool_use=True;
  otherwise they raise ValueError immediately. ToolBackend /
  ToolEventPolicy imports are quoted + behind TYPE_CHECKING to keep
  pyrit.prompt_target.common from importing pyrit.tools eagerly.
- PromptTarget.send_prompt_async picks up @tool_loop (below the existing
  @Final). The wrapper is a no-op when tool_event_policy is None, so
  every existing target keeps its current behavior. _tool_parser
  (property, default None) and _tool_schemas() (default []) are added
  on the base class as the two collaborators @tool_loop reads.
- _permissive_configuration is updated to flip supports_tool_use=True
  alongside the other supports_X flags so the all-flags-on probe loop
  in test_discover_target_capabilities still sees every CapabilityName
  value as supported.

tests/unit/tools/conftest.py drops the hand-decorated @tool_loop on
_FakeToolTarget.send_prompt_async (which would now violate the base
class's @Final) and instead wires policy + backend through
TargetConfiguration. _tool_parser becomes a subclass property since
the base class now defines one.

Tests:
- test_tool_event_policy.py adds U7 (capability flag wiring through the
  wrapper) plus dataclass field defaults and the TargetConfiguration
  validator.
- test_prompt_target_tool_loop.py adds U1 / U2 (DB-end) / U8 / U9 / U11
  exercised against a _ProductionShapedTarget that uses the real
  base-class _get_normalized_conversation_async (memory round-trip via
  patch_central_database). Plus default-_tool_parser / -_tool_schemas
  assertions.

Validation: 8104 unit tests pass; pre-commit clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a generic, target-agnostic tool-use primitive: a new pyrit/tools/ package with a tool_loop decorator (applied to PromptTarget.send_prompt_async), ToolBackend ABC with LocalToolBackend + MCPToolBackend (stdio) implementations, MCP client/server-spec types, ToolCallParser protocol, a new TargetCapabilities.supports_tool_use flag, and TargetConfiguration.tool_event_policy / tool_backend fields. Two new exception classes (ToolCallNotSupported, ToolCallLoopLimitExceeded) carry partial-conversation state. The PR is marked DRAFT; the OpenAI target migrations described in the PR description are not yet present in the diff.

Changes:

  • New pyrit.tools package (ToolCall, tool_loop, backends, MCP client/specs, parsers)
  • Base PromptTarget.send_prompt_async made @final @tool_loop, with default no-op _tool_parser / _tool_schemas; capability + configuration fields added
  • mcp>=1.0,<2 added as a core (non-optional) dependency; new unit tests against a real FastMCP stdio subprocess

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pyrit/tools/init.py Public re-exports for the new tools package
pyrit/tools/models.py ToolCall, ToolEventPolicy, tool_loop decorator core
pyrit/tools/backend.py ToolBackend ABC with default sequential dispatch
pyrit/tools/local_backend.py In-process callable backend with error envelopes
pyrit/tools/parsers.py ToolCallParser protocol + canonical filter helper
pyrit/tools/mcp_client.py Stdio MCPClient, three MCPServerSpec variants (only Local implemented)
pyrit/tools/mcp_backend.py Multi-server routing, name-prefixing, allow-listing
pyrit/prompt_target/common/prompt_target.py Apply @tool_loop to send_prompt_async; default tool hooks
pyrit/prompt_target/common/target_capabilities.py New TOOL_USE capability and supports_tool_use flag
pyrit/prompt_target/common/target_configuration.py New tool_event_policy / tool_backend fields + validators
pyrit/prompt_target/common/discover_target_capabilities.py Permissive profile enables supports_tool_use
pyrit/exceptions/exception_classes.py, init.py ToolCallNotSupported, ToolCallLoopLimitExceeded
pyproject.toml, uv.lock mcp>=1.0,<2 added as a core dependency + transitive deps
tests/unit/tools/* Decorator, policy wiring, local backend, MCP client/backend, real stdio echo server fixture

Comment thread tests/unit/tools/test_tool_event_policy.py Outdated
Comment thread pyproject.toml
Comment thread tests/unit/tools/test_mcp_client.py Outdated
Comment thread pyrit/prompt_target/common/target_configuration.py
Comment thread pyrit/tools/__init__.py Outdated
Comment thread pyrit/tools/__init__.py Outdated
Comment thread pyrit/tools/models.py
Comment thread pyrit/tools/mcp_backend.py
Comment thread pyrit/prompt_target/common/prompt_target.py
ValbuenaVC and others added 8 commits May 28, 2026 12:10
…nd Response target

This commit is intentionally empty. It records a scope decision made in
response to PR review feedback. No code changes - the C5 working set was
uncommitted and has been reverted.

# Why we're dropping C5

Review feedback raised two concerns the original C5 did not address:

  1. **Duplication against OpenAIResponseTarget.** The Response target
     already implements an agentic tool loop (openai_response_target.py
     lines 590-626), the canonical function_call envelope (lines 666-674),
     a Python-callable dispatch registry (custom_functions), and an
     allow-list-ish hook (fail_on_missing_function). C5 layered a parallel
     implementation on top for the Chat target instead of converging both
     targets onto one stack.
  2. **Chat Completions is on its way out.** OpenAI has publicly framed
     the Responses API as the long-term replacement for Chat Completions.
     Investing in tool-call plumbing for a deprecated endpoint ages out
     fast and obscures the actual value of this PR.

The right framing is: this PR is not "tool calling for all targets." It
is "pluggable tool-execution backends + a client-side agentic loop for
non-Responses-API targets." The Responses API is one transport; this PR
is the in-process abstraction that works for every transport.

# What survives unchanged

C1 (mcp SDK dep), C2 (tools/ scaffold + LocalToolBackend), C3 (MCPClient
+ MCPToolBackend + Docker stub), and C4 (capability flag + @tool_loop
wired on the base class) all remain shipped. The genuinely-novel work -
local stdio MCP, pluggable backend ABC, ToolEventPolicy (RAISE /
EXECUTE / RETURN_RAW), allowed_tools - is unaffected.

# The new design

**One agentic loop driver.** The @tool_loop decorator on
PromptTarget.send_prompt_async (shipped in C4) is the only loop driver.
Every target's _send_prompt_to_target_async returns exactly ONE Message
per call. The decorator stitches iterations into the response list.

**One tool execution layer.** Every dispatched call flows through
ToolBackend.dispatch_async(call) -> envelope. Backends (LocalToolBackend
for Python callables, MCPToolBackend for stdio MCP subprocesses, future
DockerMCPToolBackend, future CompositeToolBackend) are interchangeable
behind a single ABC.

**Migrate OpenAIResponseTarget onto the decorator (new C5).** Delete
the in-class while loop (lines 590-626). _send_prompt_to_target_async
becomes "build body, call API, parse response into one Message, return."
Add _tool_parser returning CanonicalEnvelopeParser (extracts only
function_call pieces; reasoning, mcp_call, web_search_call, etc.
continue to pass through to Memory without dispatch). Translate the
configured backend's schemas into the Responses-API tools shape inside
_construct_request_body (without clobbering an existing
extra_body_parameters["tools"]). Wrap custom_functions as a
LocalToolBackend internally with DeprecationWarning(removed_in="0.16.0"),
preserving the existing fail_on_missing_function semantics.

**Integration tests (new C6).** Rewrite to use the Response target as
the sole OpenAI tool-calling path, plus end-to-end scenario tests
against the real echo_mcp_server.

**OpenAIChatTarget receives no tool-calling support in this PR.** A
future PR can pull Chat onto the same abstractions if anyone still
wants it, but the recommended OpenAI tool-calling path becomes the
Responses API.

# Risks

  * Behavior-parity on the Response target: callers that rely on
    `len(send_prompt_async(...)) == iterations` rather than scanning
    piece types will need updating. Existing function-chaining tests act
    as sentinels.
  * `custom_functions` deprecation must preserve `fail_on_missing_function`
    semantics through the LocalToolBackend wrapper.
  * Response parser must continue to round-trip non-`function_call` piece
    types (reasoning, mcp_call, etc.) to Memory without dispatching.
  * `extra_body_parameters["tools"]` takes precedence over backend-derived
    tools so existing manual configs keep working.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
C6 collapses the Response target in-class agentic loop into the @tool_loop
decorator shipped in C4, and routes tool dispatch through LocalToolBackend
(wrapping the existing custom_functions registry as a deprecation shim).

# What changed

- _send_prompt_to_target_async no longer runs a while loop. It now returns
  exactly one Message per call. The agentic loop is driven by @tool_loop on
  the base class.
- Added _tool_parser returning CanonicalEnvelopeParser from
  pyrit/tools/parsers.py. The parser extracts only function_call pieces;
  reasoning, mcp_call, web_search_call, computer_call, local_shell_call, etc.
  pass through to Memory unchanged because the parser ignores them and the
  decorator exits cleanly on the empty parse.
- Added _tool_schemas() translating the configured backend schemas into the
  Responses-API tools shape.
- _construct_request_body injects tools=... when the backend has schemas.
  User-supplied extra_body_parameters["tools"] takes precedence.
- supports_tool_use=True on _DEFAULT_CONFIGURATION.
- custom_functions= now emits DeprecationWarning(removed_in="0.16.0").
  Internally wraps into a LocalToolBackend. A LocalToolBackend is always
  installed (populated or empty) so legacy target._custom_functions[name]=fn
  mutations keep affecting dispatch via a back-compat property.
- Constructor deep-copies the class-level _DEFAULT_CONFIGURATION before
  mutating it (PromptTarget.get_default_configuration returns the singleton,
  so otherwise one instances tool_backend would leak across every other
  instance).

# What did NOT change

The legacy _find_last_pending_tool_call, _execute_call_section, and
_make_tool_piece helpers remain in place. They are no longer called from
production code, but existing tests still cover them; cleanup is deferred to
the same follow-up PR that removes the custom_functions kwarg after the
0.16.0 deprecation window.

# Tests

- New tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py
  with 7 tests covering deprecation warning, dispatch through user-supplied
  LocalToolBackend, schema injection, extra_body precedence, no-backend
  behavior, and reasoning-only passthrough.
- All 5 existing function-chaining sentinel tests in
  test_openai_response_target_function_chaining.py pass unchanged: the
  back-compat _custom_functions property keeps in-place mutations working.

8131 unit tests green; pre-commit clean (ruff format, ruff check, ty).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
C7 adds end-to-end integration coverage of the @tool_loop decorator,
MCPToolBackend, and MCPClient stack against the real echo_mcp_server
subprocess. Only the OpenAI Responses HTTP layer is mocked; the MCP
stdio subprocess, AsyncExitStack lifecycle, canonical envelope
round-trip, and RedTeamingAttack execution path all run unmocked.

# What ships

tests/integration/tools/test_red_teaming_with_tools.py with three tests:

1. test_red_teaming_response_target_with_mcp_echo - end-to-end smoke
   test. RedTeamingAttack drives OpenAIResponseTarget configured with a
   MCPToolBackend pointing at echo_mcp_server. The Responses API mock
   returns one function_call followed by a stop response. Asserts the
   tool call actually reaches the MCP subprocess and the result lands
   back in the second API call as a function_call_output.

2. test_red_teaming_persists_canonical_transcript_in_memory - verifies
   the canonical envelope contract (plan section 13). Reads the
   conversation back from Memory after attack.execute_async returns
   and asserts the function_call and function_call_output pieces are
   present, in order, with matching call_ids.

3. test_red_teaming_dispatches_all_tool_calls_per_turn - regression
   test for the intentional behavior change from C6. The pre-C6 in-class
   loop in OpenAIResponseTarget only dispatched the LAST function_call
   per turn; the @tool_loop decorator now dispatches every call in
   declaration order. Issues both echo and add in one response and
   asserts both results land in the next API call.

# Test infrastructure

- LocalMCPServerSpec uses command=sys.executable + args=(echo_server,).
- Mock objective scorer returns a true score so RedTeamingAttack exits
  cleanly after one turn.
- Mock adversarial target returns a single scripted prompt wrapped as
  list[Message] (PromptTarget.send_prompt_async contract).
- Score, ComponentIdentifier, and PromptTarget MagicMock(spec=...) usage
  matches the existing tests/unit/executor/attack patterns.

All three integration tests pass; pre-commit clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a parser that walks text MessagePieces for marker-delimited

JSON blocks of the form {"name": ..., "arguments": {...}} and emits

canonical ToolCall instances. Marker pattern, call_id prefix, and

surrounding-text policy (truncate / extract-all / strict) are all

constructor-controlled so a single class covers angle-bracket,

pipe-delimited tag pair, and other chat-template syntaxes.

The parser is the F1 (per plan) piece that lets non-Responses-API

targets participate in PyRIT's @tool_loop without a per-vendor

parser implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TargetConfiguration.as_identifier_params() now snapshots the configured

tool_event_policy (behavior + max_tool_iterations) and tool_backend

(backend class + sorted list of advertised tool names). Two targets

that differ only in their tool backend now get distinct identifiers,

which downstream consumers rely on to route by target identity.

Schema serialization is best-effort: backends with shape-quirky schemas

that lack a recoverable 'name' field are silently dropped from the

identifier surface. Exact callables and transports are not serialized

because they are not deterministic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PyRIT's docs build uses MyST, not reStructuredText, so reST roles like

:class:\Foo\ render as literal text in the rendered docs and mismatch

the rest of the codebase. Convert all roles in the new pyrit/tools/

module to plain double-backtick code spans, and drop the in-flight

commit-numbering references (C1/C2/...) that were carry-overs from the

shipping plan and no longer mean anything in source.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aises

Three small cleanups in the new tools test suite:

1. Remove @pytest.mark.asyncio decorators -- the project sets

   asyncio_mode='auto' in pyproject.toml so the marker is a no-op that

   creates the appearance of opt-in async test discovery.

2. Narrow pytest.raises((AttributeError, Exception)) to

   dataclasses.FrozenInstanceError on the two frozen-dataclass guards

   in test_mcp_client.py. The previous pattern matched every Exception

   and would have masked unrelated regressions.

3. Drop in-flight C1/C2/.../C10 commit-id strings from test docstrings;

   they referenced the shipping plan, not the source tree, and read as

   noise after the commits land.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ValbuenaVC ValbuenaVC requested a review from Copilot May 28, 2026 20:15
@ValbuenaVC ValbuenaVC changed the title [DRAFT] FEAT: Tool Use + MCP FEAT: Tool Use + MCP May 28, 2026
@ValbuenaVC ValbuenaVC marked this pull request as ready for review May 28, 2026 20:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@rlundeen2
Copy link
Copy Markdown
Contributor

Cool idea; can we have a design meeting?

@ValbuenaVC ValbuenaVC requested a review from Copilot May 29, 2026 17:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 13 comments.

Comments suppressed due to low confidence (1)

pyrit/prompt_target/openai/openai_response_target.py:23

  • Deprecation warnings should go through pyrit.common.deprecation.print_deprecation_message rather than calling warnings.warn directly, to keep formatting/stacklevel consistent and filterable across the codebase.
import json
import logging
import warnings
from collections.abc import Awaitable, Callable, MutableSequence
from enum import Enum
from typing import (
    Any,
    Literal,
    Optional,
    cast,
)

from openai.types.shared import ReasoningEffort

from pyrit.common.data_url_converter import convert_local_image_to_data_url_async
from pyrit.exceptions import (
    EmptyResponseException,
    PyritException,
    pyrit_target_retry,
)

Comment thread pyrit/tools/models.py
Comment on lines +196 to +205
for _ in range(max_iter):
responses_this_turn = await self._send_prompt_to_target_async(
normalized_conversation=normalized_conversation,
)
all_responses.extend(responses_this_turn)

if parser is None:
return all_responses

last_response = responses_this_turn[-1]
Comment thread pyrit/tools/models.py
Comment on lines +228 to +235
results = await backend.dispatch_all_sequential_async(pending_calls)
tool_msg = _build_function_call_output_message(
reference_piece=last_response.message_pieces[0],
outputs=results,
)
all_responses.append(tool_msg)
normalized_conversation = list(normalized_conversation) + [last_response, tool_msg]

Comment on lines 12 to 16
from pyrit.models.json_response_config import _JsonResponseConfig
from pyrit.prompt_target.common.target_capabilities import CapabilityName, TargetCapabilities
from pyrit.prompt_target.common.target_configuration import TargetConfiguration
from pyrit.tools import ToolCallParser, tool_loop

Comment on lines +208 to +216
if custom_functions:
warnings.warn(
"OpenAIResponseTarget(custom_functions=...) is deprecated and will be "
"removed in 0.16.0. Configure tool_backend on TargetConfiguration "
"instead (e.g. LocalToolBackend(callables=..., schemas=..., "
"fail_on_missing_function=...)).",
DeprecationWarning,
stacklevel=2,
)
Comment on lines +175 to +176
@pytest.mark.asyncio
async def test_red_teaming_response_target_with_mcp_echo(patch_central_database):
class TestToolBackendDispatch:
"""The modern path: pass tool_backend via TargetConfiguration."""

@pytest.mark.asyncio
class TestToolSchemasInjection:
"""_construct_request_body injects backend schemas when present."""

@pytest.mark.asyncio
assert body["tools"][0]["type"] == "function"
assert body["tools"][0]["name"] == "get_weather"

@pytest.mark.asyncio
)
assert body["tools"] == legacy

@pytest.mark.asyncio
must therefore see an empty parse and exit cleanly.
"""

@pytest.mark.asyncio
Victor Valbuena and others added 5 commits May 29, 2026 11:05
…all_output pieces

ChatMessageNormalizer raised on function_call / function_call_output data types, which meant any target whose wire format runs through it (AzureMLChatTarget, HuggingFaceChatTarget, OpenAIChatTarget) could not round-trip a tool-call conversation through @tool_loop.

Adds a per-message tool-message detector that converts function_call pieces to an assistant message with content=null and a ToolCall populated from the canonical envelope, and function_call_output pieces to a role=tool message with tool_call_id set from the envelope's call_id and content set to the output. Matches the OpenAI Chat Completions wire shape.

Also fixes ChatMessage.ToolCall whose 'function' field was typed as a bare string; OpenAI ships it as a nested object with name + arguments. ChatMessage.content now permits None for assistant messages that carry only tool_calls (the OpenAI API requires content=null in that shape).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The base default for _tool_schemas() now reads self.configuration.tool_backend.schemas verbatim. Subclasses that need wire-format wrapping (currently only OpenAIResponseTarget, which prepends type=function) override the method and reuse the base via super() to get the raw schemas.

Removes a small but real duplication risk for the upcoming AzureMLChatTarget / HuggingFaceChatTarget tool-calling paths, which would otherwise each reimplement the 'read schemas from configured backend' boilerplate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AzureMLChatTarget now participates in PyRIT's @tool_loop when callers
supply a ToolCallParser at construction. The parser flips
supports_tool_use=True on the default capabilities so callers don't
need to construct a custom_configuration just to opt in. A convenience
tool_backend kwarg installs the backend onto the configuration in one
step.

Wire format: _tool_schemas() wraps the backend's schemas in the OpenAI
Chat Completions tools shape (with each schema nested under a
"function" key). _construct_http_body_async injects the wrapped
schemas as a top-level tools field when non-empty. Deployments unwrap
that envelope before passing to tokenizer.apply_chat_template; see
plan section 12.9 for the contract.

Response handling: _complete_chat_async now returns the parsed JSON
body (was: string output). The new _materialize_response walks the
response dict and emits one text MessagePiece for the output field
plus one function_call MessagePiece per envelope in the tool_calls
field; CanonicalEnvelopeParser then finds those pieces in the loop's
next iteration.

The no-tools path is unchanged: requests without tool_parser produce
byte-identical request bodies, verified by
test_request_body_omits_tools_key_when_no_backend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same shape as the AzureMLChatTarget F2 change: callers supply a
ToolCallParser at construction; the parser flips
supports_tool_use=True on the default capabilities so no
custom_configuration is required to opt in. A convenience tool_backend
kwarg installs the backend onto the configuration in one step.

Wire format differs from AzureML because HuggingFace runs the model
in-process via the transformers library:

  * _tool_schemas() returns the bare backend schemas (no OpenAI
    envelope) because tokenizer.apply_chat_template expects bare
    function schemas, not the Chat Completions wrapper.
  * _apply_chat_template forwards tools= into apply_chat_template
    when schemas are present; the model's tool-trained chat template
    renders the model-family-specific tools block (Qwen wraps in
    <tools>...</tools>, Llama uses a system-message preamble, etc.).
  * _build_chat_messages now walks every piece in each message and
    converts function_call / function_call_output envelopes to the
    chat-template tool message shape (assistant + tool_calls list,
    role=tool + tool_call_id) so the model sees the canonical
    in-context tool conversation.

The no-tools path is unchanged: without tool_parser, no tools key is
passed to apply_chat_template and no tool message translation runs.

The user-supplied tool_parser walks the response text for inline
tool-call markers; InlineToolCallParser is the typical choice for
ChatML-style angle-bracket markers, but the user can supply any
ToolCallParser implementation (different marker regex, different
mode).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds tests/integration/tools/test_azure_ml_with_tools_integration.py
exercising the full PyRIT @tool_loop stack against AzureMLChatTarget
with only the HTTP layer mocked. The mocked responses match the §12.9.2
canonical envelope shape: first response carries a tool_calls field
that the loop dispatches via LocalToolBackend; second response is the
final assistant text.

Asserts the canonical four-piece transcript shape persists in Memory:
[user text, assistant function_call, tool function_call_output,
assistant text], with the call_id round-tripping between the
assistant function_call piece and the tool function_call_output piece,
and the tool output reflecting the actual dispatched callable's return
value.

Also covers the no-tools backward-compatibility path: a target
constructed without tool_parser produces a request body that has no
tools key, proving the F2 changes do not regress existing AzureML
deployments that don't carry the patched scoring script.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ValbuenaVC pushed a commit to ValbuenaVC/PyRIT that referenced this pull request May 29, 2026
The previous cleanup commit (31ed2fb) removed the pyrit/tools/ package and tests/unit/tools/ directory, but several tool-calling changes from PR microsoft#1811 (MCP) remained mixed in:

- pyrit/exceptions: ToolCallNotSupported and ToolCallLoopLimitExceeded

- pyrit/prompt_target/common/: @tool_loop decoration on send_prompt_async, supports_tool_use capability, tool_event_policy and tool_backend slots on TargetConfiguration

- pyrit/prompt_target/openai/openai_response_target.py: migration onto @tool_loop + LocalToolBackend (the in-class agentic loop was removed in favor of the decorator)

- tests/integration/tools/ and tests/unit/prompt_target/target/test_openai_response_target_c6_migration.py

- pyproject.toml + uv.lock: mcp Python SDK dependency

All of the above are reverted to origin/main. The adversarial benchmark refactor (this PR's actual scope) is unaffected; 128 targeted unit tests across openai_response_target, function_chaining, and scenario/benchmark still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants